Phonetic speaker recognition using maximum-likelihood binary-decision tree models

نویسندگان

  • Jirí Navrátil
  • Qin Jin
  • Walter D. Andrews
  • Joseph P. Campbell
چکیده

Recent work in phonetic speaker recognition has shown that modeling phone sequences using n-grams is a viable and effective approach to speaker recognition, primarily aiming at capturing speaker-dependent pronunciation and also word usage. This paper describes a method involving binary-tree-structured statistical models for extending the phonetic context beyond that of standard n-grams (particularly bigrams) by exploiting statistical dependencies within a longer sequence window without exponentially increasing the model complexity, as is the case with n-grams. Two ways of dealing with data sparsity are also studied; namely, model adaptation and a recursive bottom-up smoothing of symbol distributions. Results obtained under a variety of experimental conditions using the NIST 2001 Speaker Recognition Extended Data Task indicate consistent improvements in equal-error rate performance as compared to standard bigram models. The described approach confirms the relevance of long phonetic context in phonetic speaker recognition and represents an intermediate stage between short phone context and word-level modeling without the need for any lexical knowledge, which suggests its language independence.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Adaptive decision tree-based phone cluster models for speaker clustering

This study presents an approach to speaker clustering using adaptive decision tree-based phone cluster models (DT-PCMs). First, a large broadcast news database is used to train a set of phone models for universal speakers. The multi-space probability distributed-hidden Markov model (MSD-HMM) is adopted for phone modeling. Confusing phone models are merged into phone clusters. Next, for each sta...

متن کامل

Probabilistic state clustering using conditional random field for context-dependent acoustic modelling

Hidden Markov Models are widely used in speech recognition systems. Due to the co-articulation effects of continuous speech, context-dependent models have been found to yield performance improvements. One major issue with contextdependent acoustic modelling is the robust parameter estimation of unseen or rare models in the training data. Typically, decision tree state clustering is used to ensu...

متن کامل

Comparison of Artificial Neural Network, Decision Tree and Bayesian Network Models in Regional Flood Frequency Analysis using L-moments and Maximum Likelihood Methods in Karkheh and Karun Watersheds

Proper flood discharge forecasting is significant for the design of hydraulic structures, reducing the risk of failure, and minimizing downstream environmental damage. The objective of this study was to investigate the application of machine learning methods in Regional Flood Frequency Analysis (RFFA). To achieve this goal, 18 physiographic, climatic, lithological, and land use parameters were ...

متن کامل

A discriminative splitting criterion for phonetic decision trees

Phonetic decision trees are a key concept in acoustic modeling for large vocabulary continuous speech recognition. Although discriminative training has become a major line of research in speech recognition and all state-of-the-art acoustic models are trained discriminatively, the conventional phonetic decision tree approach still relies on the maximum likelihood principle. In this paper we deve...

متن کامل

Towards a non-parametric acoustic model: an acoustic decision tree for observation probability calculation

Modern automatic speech recognition systems use Gaussian mixture models (GMM) on acoustic observations to model the probability of producing a given observation under any one of many hidden discrete phonetic states. This paper investigates the feasibility of using an acoustic decision tree to directly model these probabilities. Unlike the more common phonetic decision tree, which asks questions...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2003